Document branch relaxation #58
base: main
Unconditional branches are implemented by the `j(al)?r?` pseudoinstructions.
(The underlying instructions are `jalr?`.)
The `j(al)?` targets can be any symbol or address.
Can it really be any symbol/address? I haven't tried them all.
“Any” is indeed an overstatement, since the call/tail macros have a maximum displacement of roughly 2 GB.
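The "roughly 2 GB" figure comes from `call`/`tail` expanding to an `auipc` + `jalr` pair: `auipc` contributes a signed 20-bit upper immediate (shifted left by 12) and `jalr` a signed 12-bit lower immediate. A minimal sketch, assuming 32-bit PC-relative arithmetic; `split_pcrel` is a hypothetical helper, not a toolchain API. It shows how an offset is split and where the reachable window ends.

```python
# Reachable window of an auipc + jalr pair (the expansion of call/tail).
# Assumes a signed 20-bit upper immediate and a signed 12-bit lower immediate.
HI20_MIN, HI20_MAX = -(1 << 19), (1 << 19) - 1
LO12_MIN, LO12_MAX = -(1 << 11), (1 << 11) - 1


def split_pcrel(offset):
    """Split a PC-relative byte offset into (hi20, lo12) such that
    (hi20 << 12) + lo12 == offset, or return None if it cannot be reached."""
    # jalr sign-extends its 12-bit immediate, so round the upper part
    # (add 0x800 before shifting) to keep the low part in [-2048, 2047].
    hi20 = (offset + 0x800) >> 12
    lo12 = offset - (hi20 << 12)
    if not (HI20_MIN <= hi20 <= HI20_MAX):
        return None
    assert LO12_MIN <= lo12 <= LO12_MAX
    return hi20, lo12
```

Working through the edge cases, the window is [-0x80000800, 0x7ffff7ff] bytes, i.e. just under ±2 GiB.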
Thanks. I'll read up on this and improve it.
Conditional branches are implemented by the `b(l|g)(t|e)(z|u)?` and `b(eq|ne)z?` pseudoinstructions.
(The underlying instructions are `b(lt|ge)u?` and `b(eq|ne)`.)
Again, the targets can be any symbol or address.
(Same as above.)
Same here, except there’s a 1 MiB limit.
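To make the 1 MiB figure concrete: a single B-type branch reaches only even offsets in [-4096, 4094], and the relaxed form is an inverted branch that skips over a `jal`, whose J-type immediate reaches [-1 MiB, 1 MiB - 2]. A minimal sketch, assuming no compressed (RVC) instructions and that the `jal` sits 4 bytes after the branch; `conditional_branch_length` is a hypothetical helper, not part of any toolchain.

```python
# Signed, even byte offsets reachable by a single RISC-V instruction.
# B-type (beq/bne/blt/bge/bltu/bgeu): 13-bit immediate, bit 0 implied zero.
B_MIN, B_MAX = -(1 << 12), (1 << 12) - 2
# J-type (jal): 21-bit immediate, bit 0 implied zero.
J_MIN, J_MAX = -(1 << 20), (1 << 20) - 2


def conditional_branch_length(offset):
    """Number of 4-byte instructions a relaxing assembler would need for a
    conditional branch whose target is `offset` bytes from the branch, or
    None if even the inverted-branch + jal form cannot reach it."""
    if offset % 2:
        raise ValueError("branch offsets must be even")
    if B_MIN <= offset <= B_MAX:
        return 1
    # Relaxed form: the inverted branch skips a jal placed 4 bytes later,
    # so the jal's own offset to the target is offset - 4.
    if J_MIN <= offset - 4 <= J_MAX:
        return 2
    return None
```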
I just saw that LLVM folks (including @jrtc27 and @asb) are pushing back against LLVM supporting conditional branch relaxation in assembly programs: https://reviews.llvm.org/D108961
It seems I misunderstood whether conditional branch relaxation is actually a feature of the RISC-V assembly language. I assumed it was (that this was part of why `beqz` et al. are considered to be pseudoinstructions), which motivated me to file this PR against riscv-asm-manual. If the community disagrees, then this PR is inappropriate and should be closed. (And my assembly code is "broken" and should be rewritten.)
Yes, the question is whether this is something that should be relied upon as a standard feature in RISC-V assembly or is an implementation-specific extension.
```
1:
```
The `bnez` is further relaxed to `bne`, while `j` is relaxed to `jal` with a relocation.
These are not relaxations. These are pure aliases for specific forms (bne with a zero register, and jal with a zero register).
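By contrast with relaxation, an alias is a fixed rewrite onto one base instruction that does not depend on the distance to the target. A sketch of the standard branch and jump aliases from the RISC-V pseudoinstruction table; the dictionary layout and the `expand_alias` helper are illustrative only. The one-operand `jal`/`jalr` forms, which link through `x1` (`ra`), are included for completeness.

```python
# Branch/jump pseudoinstructions and the single base instruction each stands for.
# {rs}, {rt} and {off} are placeholders for the source operands.
ALIASES = {
    "beqz": "beq {rs}, x0, {off}",
    "bnez": "bne {rs}, x0, {off}",
    "blez": "bge x0, {rs}, {off}",
    "bgez": "bge {rs}, x0, {off}",
    "bltz": "blt {rs}, x0, {off}",
    "bgtz": "blt x0, {rs}, {off}",
    "bgt": "blt {rt}, {rs}, {off}",     # operands swapped
    "ble": "bge {rt}, {rs}, {off}",
    "bgtu": "bltu {rt}, {rs}, {off}",
    "bleu": "bgeu {rt}, {rs}, {off}",
    "j": "jal x0, {off}",               # discard the link
    "jal": "jal x1, {off}",             # one-operand form links through ra
    "jr": "jalr x0, 0({rs})",
    "jalr": "jalr x1, 0({rs})",         # one-operand form links through ra
}


def expand_alias(mnemonic, rs="", rt="", off=""):
    """Rewrite a pseudoinstruction into its base instruction (text only)."""
    template = ALIASES.get(mnemonic)
    if template is None:
        raise KeyError(f"{mnemonic} is not a branch/jump alias")
    return template.format(rs=rs, rt=rt, off=off)
```

For example, `expand_alias("bnez", rs="a0", off="1f")` gives `bne a0, x0, 1f` no matter how far away `1f` is; whether that `bne` can actually reach its target is a separate question.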
A compiler can and should compute instruction sizes and branch offsets, and then emit the correct branch instructions. But this is an unreasonable request for hand-written assembly code. The only reasonable approach is for the assembler to do this for the programmer. GNU Binutils has been doing assembler branch relaxation since the 1990s at least, and maybe earlier, with the Motorola 68000 port perhaps being the first one.
Note that assembler branch relaxation is a different process from linker relaxation. Assembler branch relaxation may increase the size of code, and does not involve any relocations. It lets the assembly language programmer use the obvious simple branch instruction, and the assembler figures out how to translate that into valid target instructions. For CISC machines, this usually means choosing between short and long forms of the branch instruction. For RISC machines, this usually means emitting one or multiple instructions as necessary.
There are dozens of GNU Binutils targets that have branch relaxation support. The RISC-V port works roughly the same as the MIPS port here, although the MIPS port has extra complications due to delay slots and the branch-likely bit, and hence can emit longer sequences than the RISC-V port does.
The LLVM approach is that relaxation is OK as an optimization, but should not be required for correct operation. I think this rule should only apply to linker relaxation. It should not apply to assembler branch relaxation, as this is necessary for humans to write readable assembly language programs.
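For a RISC target, the "one or multiple instructions" choice can be sketched as a rewrite that keeps the short form when the target is in range and otherwise inverts the condition to jump over an unconditional `j`, which is the BNE-followed-by-J shape discussed in this thread. This is a simplified model, not the binutils implementation. It assumes the byte offset is already known, checks only B-type reach (the `j` is assumed to reach), and uses GNU-style numeric local labels (`1:` / `1f`) for the skip target; `relax_branch` is a hypothetical name.

```python
# Opposite condition for each base conditional branch.
INVERT = {"beq": "bne", "bne": "beq", "blt": "bge", "bge": "blt",
          "bltu": "bgeu", "bgeu": "bltu"}
B_MIN, B_MAX = -4096, 4094  # single B-type branch reach in bytes


def relax_branch(mnemonic, rs1, rs2, target, offset):
    """Return the assembly lines a relaxing assembler could emit for
    `mnemonic rs1, rs2, target`, given the byte distance to `target`."""
    if B_MIN <= offset <= B_MAX:
        return [f"{mnemonic} {rs1}, {rs2}, {target}"]
    # Out of range: skip over an unconditional jump using the opposite test.
    return [
        f"{INVERT[mnemonic]} {rs1}, {rs2}, 1f",
        f"j {target}",
        "1:",
    ]
```

Note that the relaxed output contains no `mnemonic` at all, which is exactly the source/disassembly mismatch objected to later in the thread.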
MIPS is a terrible example to use; its assembly language is full of pitfalls (…).
Referring to ancient architectures like the 68000 isn't great either; the world is a very different place from then. The best comparison points are generally other contemporary architectures like AArch64, where, to the best of my knowledge, there is no such equivalent, even though it also has smaller immediates for its conditional branches than for its unconditional ones.
Yes, that rule is about linker relaxation. It's unfortunate that the same term is used for opposites.
I disagree that it's necessary for readable assembly language programs, though; it rarely matters, and in the cases where it does I would argue that it is more surprising that the branch gets relaxed, since the assembly as written does not correspond to the disassembly, and only in certain edge cases.
(and for hand-written assembly it's really rather easy: you ignore the problem entirely, or it doesn't even occur to you, and on the off-chance you end up writing something where a branch target is out of range you see the error and fix your code)
Then later you modify the code again, the branch is back in range, and now your code is unnecessarily larger and slower because there is no easy way to notice when a branch falls back in range. Hence it is best if the assembler does this for you.
You don't like m68000 and MIPS as examples. How about x86? GNU as will automatically rewrite branch instructions for you, but since x86 is a CISC, it is a choice between various short and long forms of branches.
ARMv7 conditional branches have 24 bits of offset. RISC-V conditional branches have 12 bits of offset. 24 bits is enough; you would have to write contrived code to exceed that. But 12 bits is not enough, and it is easy to write code that breaks. Hence ARMv7 does not need branch relaxation, but RISC-V does.
I don't have the experience to feel comfortable taking a stand in this argument. I'd just like to share my team's use case in case it helps color the discussion. Or perhaps someone can suggest a better approach. We developed an assembly-code generator that emits tiled loop nests (from numerical methods) with various degrees of unrolling, and we want to use more compact branches where possible. The amount of unrolling is determined dynamically, based on how the generator is invoked. Currently the generator is designed to emit the simplest branches, which the GNU assembler handily expands as needed. Switching to LLVM, and observing different behavior, prompted an internal bug report (and this PR), which recently resulted in the aforementioned LLVM patch. Conservatively using longer branches hurts performance unacceptably, so "fixing my code" means implementing code-size computation in my code generator. I acknowledge that writing code generators involves reinventing a lot of compiler wheels, but this was one I hoped not to have to reinvent, especially teaching my code generator how to detect compressible instructions and to expand things like …
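For a generator like this, the code-size computation can be a short fixed-point loop. A minimal sketch under loud assumptions: every instruction is 4 bytes (it deliberately ignores compressible instructions), a relaxed branch is the 8-byte inverted-branch-plus-`j` pair, and the generator's output uses a made-up tuple form (`("insn",)`, `("label", name)`, `("branch", target)`). The `size_branches` helper is hypothetical. Start with every branch short, lay the program out, widen any branch whose target is out of B-type reach, and repeat until nothing changes; because branches only ever grow, the loop stops after at most one widening per branch.

```python
B_MIN, B_MAX = -4096, 4094  # single B-type branch reach in bytes


def size_branches(program):
    """Decide which branches need the long (inverted branch + j) form.

    Assumes 4-byte instructions (no RVC) and 8-byte long branches, and does
    not check whether the j of a long branch reaches its target.
    Returns (indices of long branches, final label addresses)."""
    long_branches = set()
    while True:
        # Lay out the program with the current size choices.
        addrs, labels, addr = [], {}, 0
        for i, item in enumerate(program):
            addrs.append(addr)
            if item[0] == "label":
                labels[item[1]] = addr
            elif item[0] == "branch":
                addr += 8 if i in long_branches else 4
            else:
                addr += 4
        # Widen every short branch whose target is now out of reach.
        changed = False
        for i, item in enumerate(program):
            if item[0] == "branch" and i not in long_branches:
                offset = labels[item[1]] - addrs[i]
                if not (B_MIN <= offset <= B_MAX):
                    long_branches.add(i)
                    changed = True
        if not changed:
            return long_branches, labels
```

With one branch followed by 1022 plain instructions and then its target, the offset is 4092 bytes and the branch stays short; with 1023 instructions the offset becomes 4096 and the branch is widened. A single pass is not enough in general, because widening one branch can push a neighbouring branch out of range.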
Yeah, that can technically happen if you're on the edge. But that's so unlikely to happen; normally you're either way under or way over. And if you were happy to rewrite your code to use a long form the first time, then clearly code size or performance wasn't a concern for that sequence, as otherwise you should have rewritten it to avoid the issue. Plus, I could make a similar argument that, with the binutils behaviour, you silently increase code size and instruction count rather than warn the developer that they might want to write their code in a more efficient manner, which makes things worse, as you can't look at the assembly and know what the instruction count is going to be.
That's the key difference. It doesn't invert the condition and add a new instruction; it just uses a different form of the same instruction, so the disassembly still matches what you wrote. Code size goes up a bit but instruction count does not, which, unless you're thrashing in your I-cache/ITLB or are trying to squeeze your code into the smallest ROM possible, is the more important thing.
I spoke of AArch64, not Armv7; there you only get a 19+2-bit immediate (19 encoded bits, plus 2 implied), and RISC-V has 13 bits, not 12, since there's an implied 0, though neither correction changes things all that much. If you're writing code that exceeds a 13-bit offset within a function, though, I do start to question why on earth you're writing it in assembly; that should be such a rare case.
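The ranges being compared follow from the encoded immediate width and the implied low zero bits: a signed immediate of n encoded bits, scaled by s, reaches from -2^(n-1)·s to (2^(n-1) - 1)·s bytes. A small sketch computing the conditional-branch figures mentioned in this exchange (RISC-V B-type: 12 encoded bits scaled by 2; AArch64 B.cond: imm19 scaled by 4; Armv7 A32 B<cond>: imm24 scaled by 4); `branch_reach` is an illustrative helper.

```python
def branch_reach(encoded_bits, scale):
    """(min, max) byte offset of a signed immediate with `encoded_bits`
    bits that the hardware multiplies by `scale` (implied low zero bits)."""
    lo = -(1 << (encoded_bits - 1)) * scale
    hi = ((1 << (encoded_bits - 1)) - 1) * scale
    return lo, hi


# Conditional-branch reach for the architectures discussed above.
REACH = {
    "RISC-V B-type (12 encoded bits, x2)": branch_reach(12, 2),
    "AArch64 B.cond (imm19, x4)": branch_reach(19, 4),
    "Armv7 A32 B<cond> (imm24, x4)": branch_reach(24, 4),
}

for name, (lo, hi) in REACH.items():
    print(f"{name}: [{lo}, {hi}]")
```

This gives about ±4 KiB for RISC-V, about ±1 MiB for AArch64 (256 times larger) and about ±32 MiB for Armv7 A32.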
@nick-knight did offer a legitimate (and not exactly rare) counterexample.
The discussion had so far been about hand-written assembly, not machine-generated (which isn't writing assembly, it's generating it). I agree it's a useful datapoint, and it perhaps suggests that a sensible approach would be to support the feature but put it behind an off-by-default option. That way you avoid the surprise of writing one branch instruction and getting two, despite the instruction you wrote being a valid RISC-V instruction save for the immediate range, but still provide a way to support people who are hand-writing assembly that is up against the branch range limits and don't want to manually expand them, and minimalistic code generators that don't want to count instructions. I'd still rather it weren't supported at all, but I doubt I'd be able to get consensus on that...
If I recall correctly, binutils for x86 does support emitting two instructions for CPUs prior to the 386: the Jcc opcodes with more than 1 byte of displacement were added in the 386. Of course that code isn't really relevant these days, but it is a historical example.
GAS for Xtensa also supports branch relaxation (on by default, with ways of disabling it), along with a few other nifty assembly-level relaxations: …
Just to clarify my position, I was mainly stating that I overall wish RISC-V ASM had adopted less "magic", though in general I go for matching binutils behaviour wherever possible. If GCC or other generators are producing code that needs this, I think it does make sense for LLVM to support it. Given that the ship has already sailed on RISC-V assembly being somewhat magic, a …
I think that m68k and x86 (one insn) cannot be used as an example for RISC-V (more than one insn) to follow. For m68k:
```
jeq .L0
...
.L0:
```
The assembler may pick either the 2-byte … One can think of …
I agree that we should avoid more magic and look ahead instead of looking behind. We don't necessarily use a magic directive like … For x86:
```
je.d32 .L0        # deprecated
{disp32} je .L0   # pseudo prefix
.L0:
```
We can think of an assembler notation to make the user intention explicit.
I think it's still viable that the GNU as behavior remains an implementation-specific extension.
This won’t come as a surprise, but I’m on the side of standardizing the GNU behavior, because it’s the pragmatic thing to do. I regard this “magic” terminology as both pejorative and aloof. First of all, there’s nothing magic about it: we all understand it. Second, we clearly have compelling use cases for it. Purity is a virtue, but it isn’t the only one. This x86 debate is a distraction. The fact is, RISC-V’s GNU port isn’t novel in this regard, though it is more aggressive than most, since RISC-V benefits disproportionately from this scheme given the constants involved.
There's nothing inherently pejorative about magic. The term is used in many situations, not always negative. In this case the main point of the term here is not to say it's inherently bad but that it carries with it some amount of surprise.
We the ISA specifiers and toolchain developers do. I've never been concerned with whether you or I can understand it, because clearly everyone here does. But we understand all kinds of details and gotchas that your average user does not. My concern is with the average software developer who has been tasked with writing some amount of RISC-V assembly and doesn't have the deep understanding we do: they will see a BEQ in the instruction set manual, write a BEQ in their source, and, when they come to debug the thing, find not a BEQ in the disassembly but a BNE followed by a J (i.e. JAL X0). That is certainly surprising to most people and may lead to confusion, but how much is unknown.
This is where we differ. It's clearly a use case, but the debate is over (a) how compelling that is and (b) whether that is sufficient to warrant the issues caused by additional surprise to inexperienced developers. You are correct that I and some others here take a more purist approach to these things, but the difference in viewpoints should be regarded as a good thing, since it ensures that, whatever the conclusion of the discussion is, both sides of the debate have been well represented and that the conclusion isn't just "LLVM must blindly follow binutils" (the conclusion may still be to standardise the current binutils behaviour, and maybe even because the pragmatic approach of providing compatibility outweighs other concerns, but at least it will have been done after properly assessing things).
That's generally my position too.
I see what you mean and I didn't mean to come across that way. I apologise.
I appreciate this healthy debate. I would expect to see a similar tension between programmer luxuries and behavioral simplicity in the design of any programming language. I would like to tease apart two aspects of this debate: whether the language offers support for branch relaxation, and whether assembler mnemonics like …
My use case, which I acknowledge may not be compelling to everyone, is for reliable support of branch relaxation. However, I'm perfectly willing to use special mnemonics/pseudoinstructions/directives/etc. to achieve this behavior. Of course, such a compromise will inevitably require changes to binutils, meaning practical issues of doing engineering work, deprecating features, or breaking compatibility. If we do not compromise, then it seems we will move toward having GNU and LLVM dialects of RISC-V assembly, and those of us who rely on the GNU features will simply vote with our feet.
A possible compromise is to create new mnemonics for the relaxable branches: e.g. instead of relaxing `beq` we could support a new name `jeq` which is relaxable, and `beq` is just a `beq`. However, I would worry about coordination with the architecture review team. I don't think that there are any software people on it, and there is no guarantee that they won't steal `jeq` from us later on. There is also the problem that there is a large existing base of RISC-V assembly language that assumes that `beq` is relaxable, and finding and fixing all of that code is not practical. So GCC would probably have to continue to relax `beq` and handle `jeq` the same as `beq`, but LLVM could relax only `jeq`, and people who want to use LLVM instead of GCC can fix their code to use `jeq` instead of `beq` as they find problems.
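The proposed split can be modelled as a per-toolchain policy. A sketch under stated assumptions: the `j`-prefixed spellings are hypothetical (only `jeq` is named in the proposal; `jne`, `jlt`, `jge`, `jltu` and `jgeu` are extrapolated here for illustration and are not part of any specification), B-type reach is [-4096, 4094], and the `relax_plain_b` flag stands in for "GCC/binutils keeps relaxing `beq`" versus "LLVM relaxes only `jeq`". `branch_form` is a hypothetical helper.

```python
B_MIN, B_MAX = -4096, 4094  # single B-type branch reach in bytes

# Hypothetical relaxable spellings mapped to their strict base instruction.
RELAXABLE = {"jeq": "beq", "jne": "bne", "jlt": "blt",
             "jge": "bge", "jltu": "bltu", "jgeu": "bgeu"}


def branch_form(mnemonic, offset, relax_plain_b):
    """Return (base_mnemonic, instruction_count) for a conditional branch.

    relax_plain_b=True models the binutils-compatible policy (beq and jeq
    both relax); False models relaxing only the j-prefixed spellings.
    Raises ValueError when a strict branch cannot reach its target."""
    base = RELAXABLE.get(mnemonic, mnemonic)
    if B_MIN <= offset <= B_MAX:
        return base, 1          # fits in a single B-type instruction
    if mnemonic in RELAXABLE or relax_plain_b:
        return base, 2          # inverted branch + j
    raise ValueError(f"{mnemonic}: offset {offset} out of range")
```

Under either policy, code written with `jeq` assembles the same way, which is what would let existing `beq`-relying code be migrated gradually.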
In general, I feel that:
The toolchain currently implements various branch relaxations, but this behavior is undocumented. I think this is something the assembly programmer should be aware of.